Building a Pet Store Operator

Learn to build an operator from scratch and extend the Kubernetes API.


In this lesson, we will build on the background information of CRDs, operators, and controllers to implement our own operator. This operator will have only one CRD, Pet, and only one controller to reconcile those Pet resources. The desired state of Pet will be reconciled to our pet store service, which we used in previous sections.

This will be an example of using Kubernetes control loops to reconcile the state of a resource that has no dependency on other resources within Kubernetes. Remember, we can model anything in CRDs and use Kubernetes as a tool for building robust APIs for any type of resource.

In this lesson, we will learn to build an operator from scratch. We will define a new CRD and controller. We will examine the build tools and the different code generation tools used to eliminate the majority of boilerplate code. We will deploy our controller and the pet store service to a local kind cluster and learn how to use Tilt.dev for faster inner-loop development cycles.


Initializing the new operator

To begin, we will initialize the new operator using the operator-sdk command-line tool, which scaffolds out a project structure for our operator:

Initializing new operator using the operator-sdk
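The initialization step might look like the following sketch; the domain and the repo path are illustrative placeholders, not the book's exact values:

```shell
mkdir petstore-operator && cd petstore-operator
operator-sdk init --domain example.com \
  --repo github.com/<your-org>/petstore-operator
```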

With the preceding commands, operator-sdk scaffolds a new operator project using an example domain, which will form the suffix of the group name for our future CRDs. The --repo flag here points at the repo for the book's code; in your own project, you would set it to your repo path or omit it and allow it to default. Let's see what is in the repo after scaffolding:

Output of the command ls -al

The preceding listing shows the top-level structure of the project. The Dockerfile contains the commands to build the controller image. The Makefile contains a variety of helpful tasks; however, we will not use it much in this walk-through. The PROJECT file contains metadata about the operator. The config directory contains the manifests needed to describe and deploy the operator and CRDs to Kubernetes. The hack directory contains a boilerplate license header that will be added to generated files and is a good place to put helpful development or build scripts. The rest of the files are just regular Go application code.

Now that we have a general idea of what was scaffolded for us, we can move on to generating our Pet resources and controller:

Generating Pet resources and controller
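The generation step might be invoked along these lines (a sketch; flag values follow the group, version, and kind described below):

```shell
operator-sdk create api --group petstore --version v1alpha1 \
  --kind Pet --resource --controller
```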

By executing the preceding commands, we've instructed operator-sdk to create a new API in the petstore group with the v1alpha1 version of the Pet kind and to generate both the CRD and the controller for the type. Note that the command created api/v1alpha1/pet_types.go and controllers/pet_controller.go, and then ran make generate and make manifests. Shortly, we will see that marker comments in both of the Go files cause make generate and make manifests to produce CRD manifests as well as Kubernetes Role-Based Access Control (RBAC) manifests for the controller. The RBAC entries give the controller the rights to perform CRUD operations on the newly generated resource. The CRD manifest contains the schema for our newly created resource.

Next, let's take a quick look at the files that have changed with the command git status:

Expected output of git status

As we can see, there are quite a few changes to the files. We will not go into depth on each of the changes. The most notable is the generation of config/crd/bases/petstore.example.com_pets.yaml, which contains the CRD for our Pet resource. In operator projects, it is common to describe the resources in the API in the api/ directory, the Kubernetes manifests under config/, and the controllers under controllers/.

Next, let's see what has been generated in api/v1alpha1/pet_types.go:

The generated pet_types.go file

The preceding code shows a snippet from the pet_types.go file. The create api command has generated a Pet resource with a spec and a status. PetSpec contains one field named Foo, which serializes with the key foo and is optional when creating or updating the resource. PetStatus contains no fields yet.

Note the comments in the file. They instruct us that this is the place to add new fields to the type and to run make after we do to ensure that the CRD manifests are updated in the config/ directory.

Now, let's look at the rest of the file:

The pet_types.go file continued

Here, we can see the definition of Pet and PetList, which both get registered with the scheme builder that follows. Note the //+kubebuilder marker comments; they instruct kubebuilder on how to generate the CRD manifests.

Note that Pet has the spec and status defined with the json tags that we have seen in the other Kubernetes resources we have worked with. Pet also includes both TypeMeta, which informs Kubernetes of the group version kind information, and ObjectMeta, which contains the name, namespace, and other metadata about the resource.

With these structures, we already have a fully functional custom resource. However, the scaffolded fields don't yet match the pet store's domain model, so we will need to update them.

Next, let's look at what was generated for PetReconciler in controllers/pet_controller.go, the controller that will run the control loop for reconciling pets:

The controller for reconciling pets

In the preceding code, we can see a PetReconciler type that embeds a client.Client, which is a generic Kubernetes API client, and a *runtime.Scheme, which contains the registered types and their schemas. Continuing downward, we can see a collection of //+kubebuilder:rbac marker comments that instruct the code generator to create RBAC rights so the controller is able to manipulate the Pet resource. Next, we can see the Reconcile func, which will be called each time a resource has changed and needs to be reconciled with the pet store. Finally, we can see the SetupWithManager function, which is called from main.go to register the controller with the manager and tell it which kind of resource the controller will reconcile.

We have covered the impactful changes from the scaffolding process. We can proceed to implement our Pet resource to reflect the domain model we have in the pet store. The pet entity in our pet store has three mutable, required properties, Name, Type, and Birthday, and one read-only property, ID. We need to add these to our Pet resource to expose them to the API:

The changes made to the Pet resource

The preceding code shows the changes we've made to Pet to reflect the domain model of the pet store service. Note the // +kubebuilder:validation:Enum marker preceding the PetType type; it tells the CRD manifest generator to add schema validation so that only the enumerated strings can be supplied for the Type field of PetSpec. Also, note that none of the fields in the spec has the omitempty JSON tag, which informs the CRD manifest generator that those fields are required.

The status of Pet has only an ID field, which is allowed to be empty. This will store the unique identifier returned from the pet store service.

Now that we have defined our Pet, let's reconcile pet with the pet store in the controller loop:

The function to Reconcile pet

The preceding code has been added to reconcile the pet resource. When we receive a change from the API server, we are not given much information; we are only provided with the NamespacedName of the pet, which contains both the namespace and the name of the pet that has changed. Remember that PetReconciler has a client.Client embedded in it, which provides access to the Kubernetes API server. We use the Get method to request the pet we need to reconcile. If the pet is not found, we return an empty reconcile result and a nil error, which informs the controller to wait for another change to occur. If there is any other error making the request, we return an empty reconcile result along with the error, and the reconciler will retry with exponential backoff.

If we are able to fetch the pet, we then create a patch helper, which will allow us to track changes to the Pet resource during the reconciliation loop and patch the resource change back to the Kubernetes API server at the end of the reconcile loop. The defer ensures that we patch at the end of the Reconcile func.

If the pet has no deletion timestamp set, then we know that Kubernetes has not marked the resource for deletion, so we call ReconcileNormal, where we will attempt to persist the desired state to the pet store. Otherwise, we call ReconcileDelete to delete the pet from the pet store.

Let's next look at ReconcileNormal and understand what we do when we have a state change to a non-deleted pet resource:

Function that makes sure PetFinalizer is added to the resource

In ReconcileNormal, we always make sure that PetFinalizer has been added to the resource. Finalizers are the way that Kubernetes knows when it can garbage-collect a resource. If a resource still has a finalizer on it, then Kubernetes will not delete the resource. Finalizers are useful in controllers when a resource has some external resource that needs to be cleaned up prior to deletion. In this case, we need to remove Pet from the pet store prior to the Kubernetes Pet resource being deleted. If we didn't, we might have pets in the pet store that don't ever get deleted.

After we set the finalizer, we build a pet store client. We won't go into more detail here, but suffice it to say that it builds a gRPC client for the pet store service. With the pet store client, we query for the pet in the store. If we can't find the pet, then we create one in the store; otherwise, we update the pet in the store to reflect the desired state specified in the Kubernetes Pet resource.

Let's take a quick look at the createPetInStore func:

Function to create a new pet

When we create the pet in the pet store, we call AddPets on the gRPC client with the desired state from the Kubernetes Pet resource and record the returned ID in the Kubernetes Pet resource status.

Let's move on to the updatePetInStore func:

Function to update the pet

When we update the pet in the store, we take the fetched store pet and update its fields with the desired state from the Kubernetes Pet resource.

If, at any point in the flow, we run into an error, we bubble up the error to Reconcile, where it will trigger a re-queue of the reconciliation loop, backing off exponentially. The actions in ReconcileNormal are idempotent. They can repeatedly run to achieve the same state and, in the face of errors, will retry. Reconciliation loops can be pretty resilient to failures.

That's about it for ReconcileNormal. Let's look at what happens in ReconcileDelete:

Function to delete a pet in the petstore

In ReconcileDelete in the preceding code block, we get a pet store client to interact with the pet store. If pet.Status.ID is not empty, we attempt to delete the pet from the pet store. If that operation is successful, we will remove the finalizer, informing Kubernetes that it can then delete the resource.

We have extended Kubernetes and created our first CRD and controller! Let's give it a run.

To start the project and see our Kubernetes operator in action, we use the following commands:

petstore-operator/
├── api/
├── client/
├── config/
├── controllers/
├── hack/
├── Dockerfile
├── Makefile
├── Tiltfile
├── PROJECT
├── go.mod
├── go.sum
├── main.go
├── .dockerignore
└── .gitignore

Run the following commands:

The preceding commands will create a kind cluster and a local Open Container Initiative (OCI) image registry, enabling us to publish images locally rather than to an external registry. After running these commands, Tilt will start at the command line. Press the spacebar to open the web view of Tilt.dev. After pressing space, there will be an error which we can safely ignore. Click the link below the “Run” button. Once we do, we should see something like the following:
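For orientation, a minimal Tiltfile for a setup like this might look like the following Starlark sketch; the image name and resource name are assumptions, not the book's exact file:

```python
# Hypothetical Tiltfile sketch -- not the book's exact file.
# Build the controller image and push it to the local registry.
docker_build('localhost:5000/petstore-operator', '.')
# Deploy the kustomize output from config/default to the kind cluster.
k8s_yaml(kustomize('config/default'))
# Track the controller's workload as one Tilt resource.
k8s_resource('petstore-operator-controller-manager')
```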

Tilt's All Resources web view

Wait for each of the services on the left panel to turn green. Once they are, it means that the pet store operator and Service have deployed successfully. If we click on one of the Services listed on the left, it will show us the log output for that component. petstore-operator-controller-manager is our Kubernetes controller. Next, we are going to apply some pets to our Kubernetes cluster and see what happens.

Let's first look at the pet samples we are going to apply. The samples are in config/samples/petstore_v1alpha1_pet.yaml:

The pet samples to apply
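The sample manifest might look something like this; the pet types and birthdays are illustrative guesses, but the field names follow the spec we defined:

```yaml
apiVersion: petstore.example.com/v1alpha1
kind: Pet
metadata:
  name: thor
spec:
  name: Thor
  type: dog            # illustrative value
  birthday: "2021-04-01T00:00:00Z"
---
apiVersion: petstore.example.com/v1alpha1
kind: Pet
metadata:
  name: tron
spec:
  name: Tron
  type: cat            # illustrative value
  birthday: "2020-06-01T00:00:00Z"
```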

We have two pets, Thor and Tron. We can apply them by opening a new terminal (by clicking the "+" icon at the top of the terminal) and entering the following commands:
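Applying the samples should look something like this (sketch):

```shell
kubectl apply -f config/samples/petstore_v1alpha1_pet.yaml
```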

That should have replied that they were created, and we should then be able to fetch them with the following command:
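A fetch along these lines should list both pets (sketch):

```shell
kubectl get pets
```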

We can see that we have two pets defined. Let's make sure they have IDs. Run this command:
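One way to check the IDs is a custom-columns query; the .status.id path follows the JSON tag we defined on the status, so adjust it if your tags differ:

```shell
kubectl get pets -o custom-columns=NAME:.metadata.name,ID:.status.id
```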

Tron has an ID generated from the pet store service; it was applied to the Kubernetes Pet resource status.

Now, we can test our reconciliation loop by changing the name of Thor to Thorbert:
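The edit command might look like this; the resource name thor is an assumption based on the sample:

```shell
kubectl edit pet thor
```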

This will open our default editor. We can go and change the value of Thor to Thorbert to cause a new reconcile loop.

We should see something similar to this output in our browser, with Tilt in the pet store operator logs:

Changing the values of the pets

As we can see from the preceding output, Thor has been renamed to Thorbert.

Finally, we can delete these pets with the following command:
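Deleting via the same sample manifest should work (sketch):

```shell
kubectl delete -f config/samples/petstore_v1alpha1_pet.yaml
```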

After deleting the resources, we should be able to check back in Tilt and see the log output reflecting that the delete operations succeeded.

In this lesson, we learned to build an operator from scratch, extended the Kubernetes API with a custom resource that reconciled state to an external Service, and used some really useful tools along the way.
